Efficient Strategies for Calculating Blockwise Likelihoods Under the Coalescent.

Authors

  • Konrad Lohse
  • Martin Chmelik
  • Simon H Martin
  • Nicholas H Barton
Abstract

The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.
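As a minimal illustration of the blockwise idea (a sketch under stated assumptions, not the authors' Mathematica implementation), consider the simplest case: a pair of lineages sampled from a single panmictic population of constant size. Assuming time is measured in units of 2N generations, the pairwise coalescence time is exponential with rate 1 and its GF is 1/(1+ω). With infinite-sites mutations at scaled rate θ = 4Nμ per block, the probability of observing k mutations in a non-recombining block is then θ^k/(1+θ)^(k+1). The function names and the example counts below are hypothetical.

```python
import math
from typing import Iterable


def pairwise_block_probability(k: int, theta: float) -> float:
    """P(k mutations in a block) for two lineages in one panmictic population.

    Assumptions (not taken from the paper's code):
      - coalescence time T ~ Exponential(1), in units of 2N generations
      - mutations on both branches are Poisson with mean theta * T,
        where theta = 4 * N * mu per block
    Integrating the Poisson probability over T gives a geometric law.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if theta <= 0:
        raise ValueError("theta must be positive")
    return theta**k / (1.0 + theta) ** (k + 1)


def block_log_likelihood(counts: Iterable[int], theta: float) -> float:
    """Log-likelihood of theta, treating blocks as independent."""
    return sum(math.log(pairwise_block_probability(k, theta)) for k in counts)


def theta_mle(counts: Iterable[int]) -> float:
    """Closed-form maximum-likelihood estimate for this one-parameter model.

    The geometric law above has mean theta, so the MLE is the sample mean.
    """
    counts = list(counts)
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical per-block mutation counts, for illustration only.
    example_counts = [0, 1, 2, 1, 0, 3, 1]
    estimate = theta_mle(example_counts)
    print("theta estimate:", estimate)
    print("log-likelihood:", block_log_likelihood(example_counts, estimate))
```

Larger samples and richer histories (migration, divergence, admixture) have no such one-line closed form, which is why the abstract stresses automating the GF recursion and exploiting its symmetries.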

Similar resources

Strategies for calculating blockwise likelihoods under the coalescent

The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proven difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from con...

Coalescent: an open-science framework for importance sampling in coalescent theory

Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schem...

A general method for calculating likelihoods under the coalescent process.

Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the gen...

Exact coalescent likelihoods for unlinked markers in finite-sites mutation models

We derive exact formulae for the allele frequency spectrum under the coalescent with mutation, conditioned on allele counts at some fixed time in the past. We consider unlinked biallelic markers mutating according to a finite sites, or infinite sites, model. This work extends the coalescent theory of unlinked biallelic markers, enabling fast computations of allele frequency spectra in multiple ...

Maximum-likelihood estimation of coalescence times in genealogical trees.

We develop a method for maximum-likelihood estimation of coalescence times in genealogical trees, based on population genetics data. For this purpose, a Viterbi-type algorithm is constructed to maximize the joint likelihood of the coalescence times. Marginal confidence intervals for the coalescence times based on the profile likelihoods are also computed. Our method of finding MLEs and calculat...

Journal:
  • Genetics

Volume 202, Issue 2

Pages: -

Publication date: 2016